Skip to content

[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Aug 7, 2025

Motivation is to upstream use of _ocml_rsqrt in
https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl

llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized: Old:
> %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5
> %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4
!4 = !{float 2.500000e+00}
!5 = !{float 3.000000e+00}
New:
< %2 = extractelement <2 x float> %0, i64 0
< %3 = tail call float @__ocml_rsqrt_f32(float noundef %2)
< %4 = insertelement <2 x float> poison, float %3, i64 0
< %5 = extractelement <2 x float> %0, i64 1
< %6 = tail call float @__ocml_rsqrt_f32(float noundef %5)
< %7 = insertelement <2 x float> %4, float %6, i64 1

Motivation is to upstream use of __ocml_rsqrt_ in
https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl

llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized:
Old:
    >   %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5
    >   %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4
    !4 = !{float 2.500000e+00}
    !5 = !{float 3.000000e+00}
New:
    <   %2 = extractelement <2 x float> %0, i64 0
    <   %3 = tail call float @__ocml_rsqrt_f32(float noundef %2)
    <   %4 = insertelement <2 x float> poison, float %3, i64 0
    <   %5 = extractelement <2 x float> %0, i64 1
    <   %6 = tail call float @__ocml_rsqrt_f32(float noundef %5)
    <   %7 = insertelement <2 x float> %4, float %6, i64 1
@wenju-he wenju-he requested a review from frasercrmck August 7, 2025 05:03
@llvmbot llvmbot added the libclc libclc OpenCL library label Aug 7, 2025
@wenju-he wenju-he requested a review from arsenm August 7, 2025 05:04
@wenju-he
Copy link
Contributor Author

wenju-he commented Aug 7, 2025

@arsenm I'm not sure if the change is an improvement, please review.

Copy link

github-actions bot commented Aug 7, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cl -- libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
View the diff from clang-format here.
diff --git a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
index 4a9ae94b7..69164fff7 100644
--- a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
+++ b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
@@ -10,13 +10,17 @@
 
 float __ocml_rsqrt_f32(float);
 
-_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) { return __ocml_rsqrt_f32(x); }
+_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) {
+  return __ocml_rsqrt_f32(x);
+}
 
 #ifdef cl_khr_fp64
 #pragma OPENCL EXTENSION cl_khr_fp64 : enable
 double __ocml_rsqrt_f64(double);
 
-_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x); }
+_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) {
+  return __ocml_rsqrt_f64(x);
+}
 
 #endif
 
@@ -24,9 +28,7 @@ _CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x)
 #pragma OPENCL EXTENSION cl_khr_fp16 : enable
 half __ocml_rsqrt_f16(half);
 
-_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) {
-  return __ocml_rsqrt_f16(x);
-}
+_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) { return __ocml_rsqrt_f16(x); }
 
 #endif
 

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ocml rsqrt should be deleted, this is implementable by

float rsqrt(float x) {
    #pragma clang fp contract(fast)
    return 1.0f / __builtin_elementwise_sqrt(x);
}

@wenju-he
Copy link
Contributor Author

wenju-he commented Aug 7, 2025

ocml rsqrt should be deleted, this is implementable by

float rsqrt(float x) {
    #pragma clang fp contract(fast)
    return 1.0f / __builtin_elementwise_sqrt(x);
}

thanks @arsenm. So current generic implementation is good:

__clc_rsqrt(__CLC_GENTYPE val) {
#pragma clang fp contract(fast)
return __CLC_FP_LIT(1.0) / __builtin_elementwise_sqrt(val);
}

Close this PR.

@wenju-he wenju-he closed this Aug 7, 2025
@wenju-he wenju-he deleted the libclc-__clc_rsqrt-amdgcn branch August 7, 2025 05:14
uditagarwal97 pushed a commit to intel/llvm that referenced this pull request Aug 8, 2025
…#19722)

Before this PR, libspirv-nvptx64--nvidiacl uses
__nv_rsqrtf/isinf/copysign
in these two functions, and libspirv-amdgcn--amdhsa uses
__ocml_rsqrt/copysign.
We can upstream use of target built-in to CLC library:
Use of __nv_rsqrtf/isinf is to be upstreamed by
llvm/llvm-project#150174.
Then the changes to libspirv-nvptx64--nvidiacl should be minimized.

Generic implementation of __clc_rsqrt is better than __ocml_rsqrt
according to
llvm/llvm-project#152436.
Use of these target built-in scalarize llvm vector intrinsic. So it is
likely better to use generic implementation.
In addition, use of generic implementation aligns with OpenCL library.

Delete libspirv/generic/math/floatn.inc.
Delete unused clc/relational/floatn.inc and
libspirv/generic/math/minmag.inc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libclc libclc OpenCL library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants